[Bugfix] [Relay] Insertion of "device_copy" CallNode to Resolve Device Conflict on Unconstrained Nodes #15090
masahi merged 3 commits into apache:main from lecoan:fix/plan_device
@mbs-octoml @Lunderberg Hi, all the test cases passed. Could you help me review the patch?
 *
 * Phase 1
 * -------
 * We iterate process the programs to find those nodes with conflicting virtual devices. If the
Thanks for your review! My apologies for the spelling errors. I have double-checked the comment using ChatGPT to ensure accuracy.
};

/*!
 * \brief Flows the device constraints over the module and find all the conflicted nodes. The
My apologies for the spelling errors. I have double-checked the comment using ChatGPT to ensure accuracy.
}

IRModule mod_;
std::unique_ptr<DeviceContext> dev_ctx_;
The reasoning is the same as in the PlanDevicesCore sub-pass, which holds DeviceDomains through a pointer to avoid unnecessary copying. All the necessary information lives in dev_ctx_, which is created in ConflictedNodeFinder and then passed to ConflictedNodeRewriter, so we use a pointer here as well.
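A minimal C++ sketch of that ownership pattern, assuming the context is handed over by moving a std::unique_ptr. The class names DeviceContext, ConflictedNodeFinder and ConflictedNodeRewriter come from the PR, but the bodies below, including the hypothetical Visit, Release and context members and the conflicted_nodes field, are illustrative stand-ins rather than the PR's code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the PR's DeviceContext: whatever phase 1 learns
// about the program. Here it only records the names of conflicted nodes.
struct DeviceContext {
  std::vector<std::string> conflicted_nodes;
};

// Phase 1 (hypothetical body): creates the context and fills it in.
class ConflictedNodeFinder {
 public:
  ConflictedNodeFinder() : dev_ctx_(std::make_unique<DeviceContext>()) {}

  void Visit(const std::string& node, bool has_conflict) {
    if (has_conflict) dev_ctx_->conflicted_nodes.push_back(node);
  }

  // Hands the context over without copying it; the finder's pointer becomes null.
  std::unique_ptr<DeviceContext> Release() { return std::move(dev_ctx_); }

 private:
  std::unique_ptr<DeviceContext> dev_ctx_;
};

// Phase 2 (hypothetical body): takes ownership of the very same context object.
class ConflictedNodeRewriter {
 public:
  explicit ConflictedNodeRewriter(std::unique_ptr<DeviceContext> dev_ctx)
      : dev_ctx_(std::move(dev_ctx)) {
    assert(dev_ctx_ != nullptr && "the finder must hand over a context");
  }

  const DeviceContext& context() const { return *dev_ctx_; }

 private:
  std::unique_ptr<DeviceContext> dev_ctx_;
};
```

Moving the unique_ptr transfers ownership of a single heap object, so whatever the finder collected reaches the rewriter without the context itself being copied, and the moved-from pointer is left null.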
 *
 * Phase 1
 * -------
 * We iterately process the programs and find nodes with conflicting virtual devices. If the
This PR addresses issue #15019, which I opened previously: the PlanDevices pass fails when two operators share the same input but are meant to run on different target devices. This situation often arises in neural networks, where multiple layers may consume the same input.
In the specific case of the expression (a+b)*(b+c), where the first add is assigned to the CPU and the second to the GPU, the PlanDevices pass fails because it cannot determine the correct device for b. The problem appears to be that PlanDevices marks b for the CPU when it first visits a+b, and then throws an error when it attempts to place b on the GPU while visiting b+c.
The solution I've implemented in this PR is to insert a device_copy automatically: if b has already been assigned to the CPU after visiting a+b, the PlanDevices pass now appends a device_copy that copies b to the GPU when it visits b+c.
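The idea behind the fix can be sketched on a toy expression graph. This is a hypothetical, self-contained C++ model, not TVM's Relay IR and not the PR's implementation: Node, Var, Call, PlanToyDevices and Demo are illustrative names, and every call carries a fixed device annotation. An unconstrained variable takes the device of the first call that consumes it; a later consumer on a different device gets a device_copy instead of an error. Because the toy also pins the multiply to the GPU (an assumption the PR text does not make), the CPU result of a+b is copied as well.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR for illustration only; it is not TVM's Relay API.
enum class Device { kUnconstrained, kCPU, kGPU };

const char* DeviceName(Device d) {
  switch (d) {
    case Device::kCPU: return "cpu";
    case Device::kGPU: return "gpu";
    default: return "?";
  }
}

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
  std::string op;    // "var", "add", "mul" or "device_copy"
  std::string name;  // variable name, only used when op == "var"
  std::vector<NodePtr> args;
  Device device = Device::kUnconstrained;
};

NodePtr Var(const std::string& name) {
  auto n = std::make_shared<Node>();
  n->op = "var";
  n->name = name;
  return n;
}

// Every call in this toy carries a fixed (annotated) device.
NodePtr Call(const std::string& op, std::vector<NodePtr> args, Device device) {
  assert(device != Device::kUnconstrained);
  auto n = std::make_shared<Node>();
  n->op = op;
  n->args = std::move(args);
  n->device = device;
  return n;
}

// Post-order walk; calls already planned are skipped. An unconstrained variable
// takes the device of its first consumer. An argument already placed on another
// device is wrapped in a device_copy instead of being reported as an error.
// Returns the number of device_copy nodes inserted.
int PlanToyDevices(const NodePtr& node, std::set<const Node*>& visited) {
  if (node->op == "var" || !visited.insert(node.get()).second) return 0;
  int copies = 0;
  for (NodePtr& arg : node->args) {
    copies += PlanToyDevices(arg, visited);
    if (arg->device == Device::kUnconstrained) {
      arg->device = node->device;  // first consumer decides
    } else if (arg->device != node->device) {
      arg = Call("device_copy", {arg}, node->device);  // resolve the conflict
      ++copies;
    }
  }
  return copies;
}

std::string Print(const NodePtr& n) {
  if (n->op == "var") return n->name;
  std::string s = n->op + "[" + DeviceName(n->device) + "](";
  bool first = true;
  for (const NodePtr& arg : n->args) {
    if (!first) s += ", ";
    s += Print(arg);
    first = false;
  }
  return s + ")";
}

// Builds (a+b)*(b+c), plans it, and returns the printed result and copy count.
std::pair<std::string, int> Demo() {
  NodePtr a = Var("a"), b = Var("b"), c = Var("c");
  NodePtr root = Call("mul",
                      {Call("add", {a, b}, Device::kCPU),   // a+b on the CPU
                       Call("add", {b, c}, Device::kGPU)},  // b+c on the GPU
                      Device::kGPU);                        // assumed: multiply on the GPU
  std::set<const Node*> visited;
  int copies = PlanToyDevices(root, visited);
  return {Print(root), copies};
}
```

Demo() returns mul[gpu](device_copy[gpu](add[cpu](a, b)), add[gpu](device_copy[gpu](b), c)) with a count of 2: b stays on the CPU for a+b, and b+c reads a GPU copy of it rather than forcing a second placement of b.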